In this project, you will build a neural network of your own design to evaluate the CIFAR-10 dataset. Our target accuracy is 70%, but any accuracy over 50% is a great start. Some of the benchmark results on CIFAR-10 include:
78.9% Accuracy | Deep Belief Networks; Krizhevsky, 2010
90.6% Accuracy | Maxout Networks; Goodfellow et al., 2013
96.0% Accuracy | Wide Residual Networks; Zagoruyko et al., 2016
99.0% Accuracy | GPipe; Huang et al., 2018
98.5% Accuracy | Rethinking Recurrent Neural Networks and other Improvements for Image Classification; Nguyen et al., 2020
Research with this dataset is ongoing. Notably, many of these networks are quite large and quite expensive to train.
## This cell contains the essential imports you will need – DO NOT CHANGE THE CONTENTS! ##
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
!pip install plotly
import plotly.graph_objects as go
Specify your transforms as a list first.
The transforms module is already loaded as transforms.
CIFAR-10 is fortunately included in the torchvision module.
Then, you can create your dataset using the CIFAR10 object from torchvision.datasets (see the torchvision.datasets.CIFAR10 documentation).
Make sure to specify download=True!
Once your dataset is created, you'll also need to define a DataLoader from the torch.utils.data module for both the train and the test set.
# Define transforms
data_dir = 'Cat_Dog_data'
# TODO: Define transforms for the training data and testing data
# Pass transforms in here, then run the next cell to see how the transforms look
train_transforms = transforms.Compose([transforms.RandomRotation(30),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
test_transforms = transforms.Compose([transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# Create training set and define training dataloader
train_data = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=train_transforms)
trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True, num_workers=2)
# Create test set and define test dataloader
test_data = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=test_transforms)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64, shuffle=False, num_workers=2)
# The 10 classes in the dataset
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
Files already downloaded and verified Files already downloaded and verified
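The transforms above normalize with a mean and standard deviation of 0.5 per channel, which maps pixel values from [0, 1] to roughly [-1, 1]. An alternative is to compute the dataset's actual per-channel statistics; the sketch below shows the arithmetic on a stand-in batch of random images (shaped like a stack of CIFAR-10 images), since the same `mean`/`std` calls apply to the real data once it is stacked into a single `(N, C, H, W)` tensor.

```python
import torch

# Stand-in for a stack of CIFAR-10 images: (N, C, H, W), values in [0, 1].
# With the real dataset, you would stack train_data images the same way.
imgs = torch.rand(100, 3, 32, 32)

# Reduce over batch, height, and width; keep one statistic per channel
mean = imgs.mean(dim=(0, 2, 3))
std = imgs.std(dim=(0, 2, 3))

print(mean.shape, std.shape)  # each is a 3-element tensor, one value per RGB channel
```

The resulting three-element tensors can then be passed straight to `transforms.Normalize(mean, std)`.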
Using matplotlib, numpy, and torch, explore the dimensions of your data.
You can view images using the show5 function defined below – it takes a data loader as an argument.
Remember that normalized images will look really weird to you! You may want to try changing your transforms to view images.
Typically using no transforms other than ToTensor() works well for viewing – but not as well for training your network.
If show5 doesn't work, go back and check your code for creating your data loaders and your training/test sets.
# The original function crashed my kernel systematically, so I rewrote it using PIL instead.
def show5(img_loader):
    dataiter = iter(img_loader)
    batch = next(dataiter)
    labels = batch[1][0:5]
    images = batch[0][0:5]
    for i in range(5):
        print(classes[labels[i]])
        image = images[i].numpy()
        image = image.transpose((1, 2, 0))      # CHW -> HWC for PIL
        image = (image * 0.5) + 0.5             # undo normalization
        image = (image * 255).astype(np.uint8)  # convert to uint8
        pil_image = Image.fromarray(image)
        pil_image.show()
# Original matplotlib version, kept for reference:
# def show5(img_loader):
#     dataiter = iter(img_loader)
#     batch = next(dataiter)
#     labels = batch[1][0:5]
#     images = batch[0][0:5]
#     for i in range(5):
#         print(classes[labels[i]])
#         image = images[i].numpy()
#         plt.imshow(image.T)
#         plt.show()
# Explore data
show5(trainloader)
show5(testloader)
cat truck ship horse frog cat ship ship plane frog
Using the layers in torch.nn (which has been imported as nn) and the torch.nn.functional module (imported as F), construct a neural network based on the parameters of the dataset.
Feel free to construct a model of any architecture – feedforward, convolutional, or even something more advanced!
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        # Two 2x2 poolings halve 32x32 twice, leaving 32 channels of 8x8
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.pool2(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.relu3(x)
        x = self.fc2(x)
        return x
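The `32 * 8 * 8` input size of `fc1` can be sanity-checked by pushing a dummy CIFAR-10-shaped input through the convolution/pooling stack. The sketch below rebuilds just that stack with `nn.Sequential` so it runs standalone (the layer hyperparameters mirror SimpleCNN above):

```python
import torch
import torch.nn as nn

# Conv/pool stack mirroring SimpleCNN's feature extractor, rebuilt for a standalone check
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
)

x = torch.zeros(1, 3, 32, 32)  # one dummy CIFAR-10 image
out = features(x)
print(out.shape)  # torch.Size([1, 32, 8, 8]) -> flattened size is 32 * 8 * 8 = 2048
```

If you change kernel sizes, strides, or the number of pooling layers, rerunning this check gives the new `fc1` input size directly.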
Specify a loss function and an optimizer, and instantiate the model.
If you use a less common loss function, please note why you chose that loss function in a comment.
model = SimpleCNN()
criterion = nn.CrossEntropyLoss()  # combines log-softmax and negative log-likelihood, so the model should output raw logits
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
Use whatever method you like to train your neural network, and ensure you record the average loss at each epoch.
Don't forget to use torch.device() and the .to() method for both your model and your data if you are using GPU!
If you want to print your loss during each epoch, you can use the enumerate function and print the loss after a set number of batches. 250 batches works well for most people!
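The training loop below runs on CPU. If a GPU is available, the usual device-handling pattern looks like the sketch below; the commented lines show where the `.to(device)` calls would go in your own loop (they are illustrative, not executed here):

```python
import torch

# Pick the GPU if one is available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# model = model.to(device)  # move the model once, before training starts
# for images, labels in trainloader:
#     images, labels = images.to(device), labels.to(device)  # move each batch
#     ...
print(device)
```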
epochs = 45
train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    test_loss = 0
    accuracy = 0
    # Turn off gradients for validation; saves memory and computation
    with torch.no_grad():
        model.eval()
        for images, labels in testloader:
            logits = model(images)
            test_loss += criterion(logits, labels).item()
            # The model outputs raw logits; the largest logit is the predicted class
            top_p, top_class = logits.topk(1, dim=1)
            equals = top_class == labels.view(*top_class.shape)
            accuracy += torch.mean(equals.type(torch.FloatTensor))
    model.train()

    train_losses.append(running_loss / len(trainloader))
    test_losses.append(test_loss / len(testloader))
    print("Epoch: {}/{}.. ".format(e + 1, epochs),
          "Training Loss: {:.3f}.. ".format(train_losses[-1]),
          "Test Loss: {:.3f}.. ".format(test_losses[-1]),
          "Test Accuracy: {:.3f}".format(accuracy / len(testloader)))
Epoch: 1/45.. Training Loss: 1.897.. Test Loss: 1.588.. Test Accuracy: 0.423
Epoch: 2/45.. Training Loss: 1.515.. Test Loss: 1.400.. Test Accuracy: 0.493
Epoch: 3/45.. Training Loss: 1.387.. Test Loss: 1.297.. Test Accuracy: 0.538
Epoch: 4/45.. Training Loss: 1.288.. Test Loss: 1.197.. Test Accuracy: 0.580
Epoch: 5/45.. Training Loss: 1.214.. Test Loss: 1.111.. Test Accuracy: 0.611
Epoch: 6/45.. Training Loss: 1.160.. Test Loss: 1.091.. Test Accuracy: 0.617
Epoch: 7/45.. Training Loss: 1.121.. Test Loss: 1.020.. Test Accuracy: 0.646
Epoch: 8/45.. Training Loss: 1.078.. Test Loss: 1.010.. Test Accuracy: 0.647
Epoch: 9/45.. Training Loss: 1.043.. Test Loss: 0.993.. Test Accuracy: 0.653
Epoch: 10/45.. Training Loss: 1.009.. Test Loss: 0.978.. Test Accuracy: 0.663
Epoch: 11/45.. Training Loss: 0.987.. Test Loss: 0.927.. Test Accuracy: 0.679
Epoch: 12/45.. Training Loss: 0.968.. Test Loss: 0.932.. Test Accuracy: 0.677
Epoch: 13/45.. Training Loss: 0.941.. Test Loss: 0.890.. Test Accuracy: 0.688
Epoch: 14/45.. Training Loss: 0.922.. Test Loss: 0.896.. Test Accuracy: 0.689
Epoch: 15/45.. Training Loss: 0.902.. Test Loss: 0.879.. Test Accuracy: 0.691
Epoch: 16/45.. Training Loss: 0.889.. Test Loss: 0.877.. Test Accuracy: 0.692
Epoch: 17/45.. Training Loss: 0.867.. Test Loss: 0.861.. Test Accuracy: 0.696
Epoch: 18/45.. Training Loss: 0.857.. Test Loss: 0.822.. Test Accuracy: 0.712
Epoch: 19/45.. Training Loss: 0.839.. Test Loss: 0.824.. Test Accuracy: 0.712
Epoch: 20/45.. Training Loss: 0.836.. Test Loss: 0.828.. Test Accuracy: 0.710
Epoch: 21/45.. Training Loss: 0.822.. Test Loss: 0.809.. Test Accuracy: 0.717
Epoch: 22/45.. Training Loss: 0.804.. Test Loss: 0.827.. Test Accuracy: 0.710
Epoch: 23/45.. Training Loss: 0.799.. Test Loss: 0.790.. Test Accuracy: 0.721
Epoch: 24/45.. Training Loss: 0.788.. Test Loss: 0.818.. Test Accuracy: 0.713
Epoch: 25/45.. Training Loss: 0.777.. Test Loss: 0.786.. Test Accuracy: 0.727
Epoch: 26/45.. Training Loss: 0.767.. Test Loss: 0.808.. Test Accuracy: 0.716
Epoch: 27/45.. Training Loss: 0.758.. Test Loss: 0.772.. Test Accuracy: 0.731
Epoch: 28/45.. Training Loss: 0.749.. Test Loss: 0.762.. Test Accuracy: 0.735
Epoch: 29/45.. Training Loss: 0.740.. Test Loss: 0.779.. Test Accuracy: 0.729
Epoch: 30/45.. Training Loss: 0.738.. Test Loss: 0.759.. Test Accuracy: 0.736
Epoch: 31/45.. Training Loss: 0.724.. Test Loss: 0.768.. Test Accuracy: 0.736
Epoch: 32/45.. Training Loss: 0.719.. Test Loss: 0.763.. Test Accuracy: 0.738
Epoch: 33/45.. Training Loss: 0.710.. Test Loss: 0.785.. Test Accuracy: 0.730
Epoch: 34/45.. Training Loss: 0.702.. Test Loss: 0.762.. Test Accuracy: 0.732
Epoch: 35/45.. Training Loss: 0.704.. Test Loss: 0.768.. Test Accuracy: 0.737
Epoch: 36/45.. Training Loss: 0.692.. Test Loss: 0.778.. Test Accuracy: 0.734
Epoch: 37/45.. Training Loss: 0.683.. Test Loss: 0.759.. Test Accuracy: 0.735
Epoch: 38/45.. Training Loss: 0.687.. Test Loss: 0.755.. Test Accuracy: 0.741
Epoch: 39/45.. Training Loss: 0.675.. Test Loss: 0.742.. Test Accuracy: 0.746
Epoch: 40/45.. Training Loss: 0.670.. Test Loss: 0.745.. Test Accuracy: 0.742
Epoch: 41/45.. Training Loss: 0.666.. Test Loss: 0.748.. Test Accuracy: 0.742
Epoch: 42/45.. Training Loss: 0.661.. Test Loss: 0.754.. Test Accuracy: 0.746
Epoch: 43/45.. Training Loss: 0.655.. Test Loss: 0.751.. Test Accuracy: 0.746
Epoch: 44/45.. Training Loss: 0.648.. Test Loss: 0.748.. Test Accuracy: 0.743
Epoch: 45/45.. Training Loss: 0.643.. Test Loss: 0.753.. Test Accuracy: 0.747
Plot the training loss (and validation loss/accuracy, if recorded).
# The original matplotlib code (commented below) crashed my kernel systematically, so I rewrote it using plotly instead.
# plt.plot(train_losses, label='Training loss')
# plt.plot(test_losses, label='Validation loss')
# plt.legend(frameon=False)
# Create the line plot
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(range(len(train_losses))), y=train_losses, name='Training loss'))
fig.add_trace(go.Scatter(x=list(range(len(test_losses))), y=test_losses, name='Validation loss'))
# Update layout
fig.update_layout(showlegend=True)
# Show the plot
fig.show()
Using the previously created DataLoader for the test set, compute the percentage of correct predictions using the highest probability prediction.
If your accuracy is over 70%, great work! This is a hard task to exceed 70% on.
If your accuracy is under 45%, you'll need to make improvements. Go back and check your model architecture, loss function, and optimizer to make sure they're appropriate for an image classification task.
# See the training log above: test accuracy is computed each epoch and ends at ~74.7%, above the 70% target.
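The accuracy bookkeeping used in the training loop (`topk`, elementwise comparison, mean) can be seen in isolation on a tiny made-up batch; the logits and labels below are synthetic, chosen so that two of the three predictions are correct:

```python
import torch

# Synthetic logits for 3 samples over 3 classes, plus their true labels
logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.2, 3.0,  0.5],
                       [1.5, 0.0,  0.3]])
labels = torch.tensor([0, 1, 2])

# Highest-probability prediction per sample, then compare against the labels
top_p, top_class = logits.topk(1, dim=1)
equals = top_class == labels.view(*top_class.shape)
accuracy = torch.mean(equals.type(torch.FloatTensor))

print(accuracy.item())  # rows 0 and 1 are correct, row 2 is not -> 2/3 ≈ 0.667
```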
Using torch.save, save your model for future loading.
checkpoint = {'state_dict': model.state_dict()}
torch.save(checkpoint, 'checkpoint.pth')
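Loading the checkpoint later follows the mirror-image pattern: instantiate the architecture first, then restore the weights with `load_state_dict` and call `eval()` before inference. The round-trip below uses a small stand-in `nn.Linear` model so it runs on its own; with the real checkpoint you would construct `SimpleCNN()` instead:

```python
import torch
import torch.nn as nn

# Stand-in model; the same save/load pattern applies to SimpleCNN
model = nn.Linear(4, 2)
torch.save({'state_dict': model.state_dict()}, 'demo_checkpoint.pth')

# Later (or in another script): rebuild the architecture, then restore the weights
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load('demo_checkpoint.pth')['state_dict'])
restored.eval()  # switch layers like dropout/batchnorm to inference behavior
```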
My model achieved over 74% accuracy: well below the state-of-the-art models listed above, but higher than Detectocorp's algorithm. Given that, and the fact that my model is relatively simple and was trained fairly quickly, I would definitely recommend building the model.
My model has a total of 6 layers and 3 activation functions. Paths worth exploring:
- Increase the depth of the network: more convolutional layers can capture more complex features. Risk: overfitting.
- Adjust the kernel size and stride: smaller kernels capture finer details, while larger kernels capture more global features.
- Try different activation functions: LeakyReLU, ELU, SELU, etc.
- Add regularization: techniques like dropout can help prevent overfitting and improve generalization.
- Increase the number of filters: more filters in the convolutional layers can capture more diverse features. Risk: computational cost and potential overfitting.
- Adjust the learning rate and optimizer.
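Several of these suggestions can be combined in one next iteration. The sketch below is one hypothetical variant (not trained or tuned here): deeper convolutional stacks, more filters, BatchNorm, and dropout before the classifier head:

```python
import torch
import torch.nn as nn

# One possible next iteration: deeper, wider, with BatchNorm and Dropout
class DeeperCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),                         # regularization against overfitting
            nn.Linear(64 * 8 * 8, 256), nn.ReLU(),   # 32x32 pooled twice -> 8x8
            nn.Linear(256, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Shape check on a dummy batch of two CIFAR-10-sized images
out = DeeperCNN()(torch.zeros(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 10])
```

This still outputs raw logits, so it drops into the same CrossEntropyLoss training loop used above.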